Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering
نویسندگان
چکیده
Several authorship analysis tasks require the decomposition of a multiauthored text into its authorial components. In this regard two basic prerequisites need to be addressed: (1) style breach detection, i.e., the segmenting of a text into stylistically homogeneous parts, and (2) author clustering, i.e., the grouping of paragraph-length texts by authorship. In the current edition of PAN we focus on these two unsupervised authorship analysis tasks and provide both benchmark data and an evaluation framework to compare different approaches. We received three submissions for the style breach detection task and six submissions for the author clustering task; we analyze the submissions with different baselines while highlighting their strengths and weaknesses.
منابع مشابه
OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection
In this paper, we propose methods for author identification task dividing into author clustering and style breach detection. Our solution to the first problem consists of locality-sensitive hashing based clustering of real-valued vectors, which are mixtures of stylometric features and bag of n-grams. For the second problem, we propose a statistical approach based on some different tf-idf featur...
متن کاملStyle Breach Detection with Neural Sentence Embeddings
The paper investigates method for the style breach detection task. We developed a method based on mapping sentences into high dimensional vector space. Each sentence vector depends on the previous and next sentence vectors. As main architecture for this mapping we use the pre-trained encoder-decoder model. Then we use these vectors for constructing an author style function and detecting outlier...
متن کاملStyle Breach Detection: An Unsupervised Detection Model
This paper deals with the sub-task of PAN 2017 Author Identification, which is to detect style breaches for unknown number of authors within a single document in English. The presented model is an unsupervised approach that will detect style breaches and mark text boundaries on the basis of different stylistic features. This model will use some classical stylistic features like POS analysis and...
متن کاملOverview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter
This overview presents the framework and the results of the Author Profiling task at PAN 2017. The objective of this year is to address gender and language variety identification. For this purpose a corpus from Twitter has been provided for four different languages: Arabic, English, Portuguese, and Spanish. Altogether, the approaches of 22 participants are evaluated.
متن کاملOverview of PAN'16 - New Challenges for Authorship Analysis: Cross-Genre Profiling, Clustering, Diarization, and Obfuscation
This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of digital text forensic research. PAN 2016 comprises three shared tasks: (i) author identification, addressing author clustering and diarization (or intrinsic plagiarism detection); (ii) author profiling, addressing age and gender prediction from a crossgenre persp...
متن کامل